We are going to use the Scarr-Rowe effect (or hypothesis) to look more closely on interaction effects. The Scarr-Rowe hypothesis states that the (genetic) heritability of a trait depends on the environment, such that the effects of genes are larger when environments are better. The underlying idea is that if everyone lives in a perfect environment, i.e. there is no variation in the relevant environment, then a trait will only depend on genes.
This interaction can be visualized as follows:
par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
heritability = c(scarce = 0.5, plentiful = 0.8)
barplot(heritability,
ylim = c(0,1),
xlab = "environment",
ylab = "heritability (effect of genes)")
Here is a DAG that describes such a model, where
and the Scarr-Rowe effect means the the coefficient of the path \(\small A \rightarrow IQ\) depends on \(\small SES\).
library(dagitty)
dag = dagitty(
"dag{
IQ;
A -> IQ;
SES -> IQ;
}")
coord.list =
list(
x=c(A=0,SES=0,IQ=1),
y=c(A=-.5,SES=.5,IQ=0))
coordinates(dag) = coord.list
drawdag(dag, cex = 1.5)
Note that DAGs do not encode interaction effects by drawing an arrow from the moderator to the relevant path. Instead, there is a path from the moderator to the outcome variable. So the DAG only tells us that IQ is a function of two other variables \(\small IQ = f(A,SES)\), but it does not tell us what the function \(f()\) is.
This function could be
or any other imaginable function that uses the variables. The model we have now in mind is:
\[ IQ = f(SES) A \]
Lets simulate data that show the Scarr-Rowe effect, first our exogenous variables. To keep things simple, we assume that all variables are close to normally distributed:
set.seed(25)
N = 250
SES = rlnorm(N,log(4.75),.3)
A = rlnorm(N,log(10),.2)
A and SES are independent
Now comes the interesting part: Scarr-Rowe assumes that the effect or weight of genes depends on the SES. So we formulate the weight of genes as as a function of SES.
library(boot)
# use inv.logit to give weights a lower and upper bound
# add a constant one to model that even at lowest SES
# there are genetic effects
b_A = function(x) 0.25 + inv.logit(x-5) * 2.75
We are literally defining the weight for A as a function of SES. This is one way to understand interactions. Here is a visualization of the weights, with a histogram of SES values on the bottom.
par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
curve(b_A(x),0,10,
ylab = expression("effect size"~beta[A]),
ylim = c(0,b_A(10)), xlab = "SES")
hist(SES,add = T, probability = T)
In this plot, the interaction is encoded by the fact that \(\small \beta_A\) has a slope.
Now we can simulate IQ values using
We assume that IQ depends on \(\small A\), whereby this effect depends on \(\small SES\): \(\small IQ = f(SES)A\). Lets simulate data from this model:
# Adding 85 to get an IQ around 100
IQ = 85 + b_A(SES)*A + rnorm(N,sd = 10)
par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
hist(IQ, main = "")
If we just look at the bivariate associations between \(\small A\) or \(\small SES\) and \(\small IQ\), we get the following plot:
To run a simple interaction with a categorical interaction variable
we dichotomise the SES variable to create the variable
SES.c:
dt = data.frame(
A = A,
IQ = IQ,
SES = SES,
SES.c = 1 + (SES > quantile(SES,.50))
)
Lets plot the association between \(\small A\) and \(\small IQ\) again, this time split by SES:
low.SES = dt$SES.c == 1
high.SES = dt$SES.c == 2
plot_data = function(dt) {
par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(IQ ~ A, data = dt, col = dt$SES.c, pch = 16, cex = .5,
ylim = range(dt$IQ), ylab = "IQ", xlab = "A")
legend("topleft", pch = 16, col = c("black","red"),
legend = c("low SES","high SES"), bty = "n")
}
plot_data(dt)
#abline(lm(IQ ~ A, data = dt[low.SES,]), col = "black")
#abline(lm(IQ ~ A, data = dt[high.SES,]), col = "red")
Next, we estimate some linear regression models with increasing complexity. Our goal is to have a model that accurately depicts the effect of A and SES on the IQ.
We start with a simple linear regression. But first some intuitions on priors:
a ~ dnorm(100,5)a ~ dnorm(0,6). This prior allows for the
possibility of a negative effect of \(\small
A\).dexp(0.25)Here is the regression model:
set.seed(1)
IQ.A =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + b*(A - 10),
a ~ dnorm(100,5),
b ~ dnorm(0,6),
sigma ~ dexp(.5)
),
data=dt)
Lets quickly look at the prior predictions to make sure the piors are OK.
prior = extract.prior(IQ.A)
A.seq = seq(from=0,to=20,length.out=50)
mu = link(
IQ.A,
post=prior,
data=data.frame(A=A.seq))
par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(
0,type = "n", ylab = "IQ", xlab = "A",
xlim = c(min(A),max(A)),
ylim = c(70,130))
matlines(
A.seq,t(mu[1:100,]),type = "l", lty = 1,
col = adjustcolor("blue", alpha = .5))
Yes, this looks good.
Here are the posterior predictions:
plot_data(dt)
plot_mu.CIs(IQ.A, data.frame(A=A.seq), "blue", spaghetti = TRUE)
This looks generally fine, but we are not capturing the group effects.
To fit a model with a main effect of education, we use the indexing approach:
set.seed(1)
IQ.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a[SES.c] + b*(A - 10),
a[SES.c] ~ dnorm(100,5),
b ~ dnorm(0,4),
sigma ~ dexp(.5)
),
data=dt)
plot_data(dt)
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 2), "red")
We can see already from the plot the the model with separate means for high and low SES is better. But here is the comparison with PSIS-LOO:
compare(
IQ.A,IQ.A_SES,
func = "PSIS") %>%
round(2)
## PSIS SE dPSIS dSE pPSIS weight
## IQ.A_SES 1879.00 22.33 0.00 NA 4.26 1
## IQ.A 1964.07 22.47 85.07 15.56 3.41 0
Lastly, we can also let the slopes vary by SES:
IQ.AxSES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a[SES.c] + b[SES.c]*(A - 10),
a[SES.c] ~ dnorm(100,5),
b[SES.c] ~ dnorm(0,2),
sigma ~ dexp(.5)
),
data=dt)
The key part of this model is that we are not specifying a main effect for A and an interaction effect with SES, but that we are estimating 2 regression coefficients, one for high and one for low SES. This simplifies putting reasonable priors on the effect in each group, but also implies that we do not put a prior on the difference between the two groups.
And again the posterior predictions:
par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot_data(dt)
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 2), "red")
Even though the DGP does not have different “intercepts” for high and log SES, we have to add it, because when the effect of A depends on SES, and A and SES are independent, we still expect children of high SES parents to have on average higher IQ.
Here is a comparison of all the models we have fit:
compare(
IQ.A, IQ.A_SES, IQ.AxSES,
func = "PSIS") %>%
round(2)
## PSIS SE dPSIS dSE pPSIS weight
## IQ.AxSES 1878.60 22.69 0.00 NA 5.46 0.63
## IQ.A_SES 1879.65 22.48 1.05 3.87 4.60 0.37
## IQ.A 1964.20 22.56 85.60 16.95 3.47 0.00
compare(
IQ.A, IQ.A_SES, IQ.AxSES,
func = "WAIC") %>%
round(2)
## WAIC SE dWAIC dSE pWAIC weight
## IQ.A_SES 1879.30 22.39 0.00 NA 4.43 0.51
## IQ.AxSES 1879.35 22.67 0.05 3.88 5.86 0.49
## IQ.A 1963.90 22.50 84.59 15.61 3.29 0.00
While the correct model has the best PSIS and WAIC value, for the top two models the differences are not large compared to the SEs of the differences.
How can we figure out if the difference in slopes is reliably larger than zero? We simply extract posteriors and calculate the difference in slopes from them:
# extract posterior
post = extract.samples(IQ.AxSES)
names(post)
## [1] "sigma" "a" "b"
head(post$b)
## [,1] [,2]
## [1,] 0.71811596 1.5048220
## [2,] 0.02823741 1.1257087
## [3,] 0.50066979 1.2945001
## [4,] 0.18805942 1.1241033
## [5,] 1.19198920 1.7680818
## [6,] 1.04307762 -0.4068256
We simply calculate the differences of the two b parameters:
delta.b = post$b[,2]-post$b[,1]
And now we can show e.g. mean and PIs:
c(mean = mean(delta.b),
PI(delta.b,prob = c(.9))) %>%
round(2)
## mean 5% 95%
## 1.03 0.00 2.07
And we can plot a histogram of the contrast:
par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
hist(delta.b, xlim = range(c(0,delta.b)), main = "")
abline(v = 0, lty = 2)
abline(v = PI(delta.b,prob = c(.95)), col = "red")
To see if our results are reasonable, lets compare our estimated parameters with the parameters governing the DGP. First the model parameters:
precis(IQ.AxSES, depth = 2) %>% round(2)
## mean sd 5.5% 94.5%
## a[1] 93.88 0.89 92.46 95.29
## a[2] 106.43 0.90 105.00 107.87
## b[1] 0.41 0.43 -0.27 1.09
## b[2] 1.43 0.46 0.70 2.16
## sigma 10.05 0.44 9.34 10.76
Now the parameters from the DGP:
rbind(
mean(b_A(SES[SES<quantile(SES,.30)])),
mean(b_A(SES[SES>quantile(SES,.70)])))
## [,1]
## [1,] 0.7264153
## [2,] 2.4808888
We are not recovering the exact parameters, after all we used a golem instead of a model that depicts the DGP, but qualitatively the results are consistent with the DGP.
Earlier, we described the function
\[ IQ = f(SES) A \]
By saying that the effect \(\small A\) is a function of \(\small SES\). However, we really just have two variables: \(\small A\) and \(\small f(SES)\) which are multiplied with each other. Therefore, these statements are equivalent:
Accordingly, we can also plot the difference between high and low \(\small SES\) as a function of \(\small A\):
A.seq = seq(from=5,to=16,length.out=50)
mu.low = link(IQ.AxSES,data=data.frame(SES.c=1,A=A.seq))
mu.high = link(IQ.AxSES,data=data.frame(SES.c=2,A=A.seq))
delta = mu.high-mu.low
CIs = apply(delta,2,PI)
par(mfrow = c(1,2), mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(A.seq, colMeans(delta),'l',
ylim = range(CIs),
col = "blue",
xlab = "A",
ylab = expression(SES-effect: IQ[high~SES]~-~IQ[low~SES]))
shade(CIs,A.seq, col = adjustcolor("blue", alpha = .25))
abline(h = 0, lty = 2)
A.seq = c(10,11)
mu.low = link(IQ.AxSES,data=data.frame(SES.c=c(1,2),A=10))
mu.high = link(IQ.AxSES,data=data.frame(SES.c=c(1,2),A=11))
delta = mu.high-mu.low
CIs = apply(delta,2,PI)
plot(c(1,2), colMeans(delta), pch = 16, cex = 3,
ylim = range(CIs), xlim = c(.8,2.2),
col = "blue", xlab = "SES", xaxt = "n",
ylab = expression(A-effect: IQ["A=11"]~-~IQ["A=10"]))
axis(1, at = c(1,2), labels = c("low","high"))
lines(c(1,1),CIs[,1], col = adjustcolor("blue",alpha = .5), lwd = 2)
lines(c(2,2),CIs[,2], col = adjustcolor("blue",alpha = .5), lwd = 2)
We are continuing with our simulated Scarr-Rowe data set, but this time we use the full data and formulate a continuous interaction.
dt = data.frame(
A = A,
IQ = IQ,
SES = SES)
You can try it the fancy way and make a 3d plot, but in this instance its a lot of effort for meager results:
library(plot3D)
## Warning: package 'plot3D' was built under R version 4.1.3
points3D(A,SES,IQ, type = "h",
col = "black",
cex = .75,
lty = 3,
pch = 16,
phi = 20,
theta = 45,
xlab = "A",
ylab = "SES",
zlab = "IQ")
A panel of 2d plots (small multiples) does the job better, and will later allow to display uncertainty.
qs = quantile(SES, probs = seq(0,1,.25))
par(mfrow = c(1,4), mar=c(2.5,2.5,2,.5), mgp=c(1.5,.5,0), tck=-.01)
for (k in 2:length(qs)) {
idx = which(dt$SES > qs[k-1] & dt$SES < qs[k])
tmp.dt = dt[idx,]
with(tmp.dt,
plot(IQ~A, pch = 16, main = paste0(k-1,". quartile SES"),
ylim = range(dt$IQ), xlim = range(dt$A)))
}
How many panels one uses depends on how one assumes the moderator
influences the effect of interest.
This first model assumes that there are just main effects of \(\small A\) and \(\small SES\). Indeed, this is not an unreasonable assumption if one remembers the raw data, which we show here again:
par(mfrow = c(1,2),mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(SES,IQ, pch = 16, cex = .5,)
plot(A,IQ, pch = 16, cex = .5,)
In preparation for the interaction model we are not centering \(\small SES\), but we are re-scaling it to have a minimum just above 0. If we would center to zero, below zero SES values would predict a negative additive genetic effect.
The simple linear model can be descried as follows (omitting indicators \(_i\) for individuals):
\[ \mu = \alpha + \beta_{A}A + \beta_{S}SES \]
IQc.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5),
a ~ dnorm(100,10),
ba ~ dnorm(0,4),
bs ~ dnorm(0,4),
sigma ~ dexp(.5)
),
data=dt)
We are using prior predictions to check if the model does a good job:
plot.pred(IQc.A_SES,dt, type = "prior")
This is pretty wild: Each line represents the expected IQ given A, which means that we should not see lines that lie mostly below or above the data points. Such lines indicate that the prior for the intercept should be adjusted, in our case it should be narrower. Lets try a bit narrower priors:
set.seed(1)
IQc.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5),
a ~ dnorm(100,7.5),
ba ~ dnorm(0,2),
bs ~ dnorm(0,2),
sigma ~ dexp(.5)
),
data=dt)
plot.pred(IQc.A_SES,dt, type = "prior")
These priors are reasonable, so we look at the predictions:
plot.pred(IQc.A_SES,dt)
As expected from the model we specified, all slopes are the same.
Note that the different “intercepts” we seem to come from this part of
the model: mu <- ... + bs*(SES - 2.5),.
If we want to model an interaction effect, we want to model that the effects of \(\small A\) and \(\small SES\) depend on each other. Broadly, we want to implement
\[ \gamma_A = f(SES) \]
if we assume the \(\small f(SES)\) is a linear function, we are saying that
curve(b_A(x),0,10,
ylab = expression("effect size"~beta[A]~" = f(SES)"),
ylim = c(0,b_A(10)), xlab = "SES")
hist(SES,add = T, probability = T)
\[ \gamma_A = \beta_A + \beta_{AS}SES \]
Here, \(\small \beta_A\) is the intercept for the effect of \(\small A\), i.e. the effect of \(\small A\) when \(\small SES = 0\). If we knew that the effect of \(\small A\) has to be zero when \(\small SES = 0\), we could omit the \(\small \beta_A\)
Now lets think back to our original regression, with slightly modified notations
\[ \mu = \alpha + \gamma_{A}A + \beta_{S}SES \]
We can just plug in \(\small (\beta_A + \beta_{AS}SES)\) in place of \(\small \gamma_a\):
\[ \mu = \alpha + (\beta_A + \beta_{AS}SES)A + \beta_{S}SES \]
and if we multiply out the brackets, we get
\[ \mu = \alpha + \beta_A A + \beta_{S}SES + \beta_{AS}SES \, A \]
which is the standard interaction model with 2 main effects.
The quap model is then similar to the previous, but adds
this interaction term: bas*(A - 10)*(SES - 2.5).
IQc.AxSES.m =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5) + bas*(A - 10)*(SES - 2.5),
a ~ dnorm(100,7.5),
ba ~ dnorm(0,2),
bs ~ dnorm(0,2),
bas ~ dnorm(0,1),
sigma ~ dexp(.5)
),
data=dt)
plot.pred(IQc.AxSES.m,dt, type = "prior")
This priors look OK (even though slopes become extreme at high SES
values), so we plot the model predictions:
plot.pred(IQc.AxSES.m,dt)
compare(IQc.A_SES,IQc.AxSES.m, func = "WAIC") %>% round(1)
## WAIC SE dWAIC dSE pWAIC weight
## IQc.AxSES.m 1879.0 23.3 0.0 NA 5.7 1
## IQc.A_SES 1886.3 23.4 7.3 7 4.9 0
compare(IQc.A_SES,IQc.AxSES.m, func = "PSIS") %>% round(1)
## PSIS SE dPSIS dSE pPSIS weight
## IQc.AxSES.m 1878.9 23.4 0.0 NA 5.6 1
## IQc.A_SES 1886.1 23.5 7.2 6.9 4.8 0
PSIS and WAIC correctly identify the interaction model, and the difference is clearer compared to the categorical interaction, which used only a subset of the data and through way information by dichotomizing.
library(akima)
library(plotly)
mu = link(IQc.AxSES.m)
s = interp(dt$A,dt$SES,colMeans(mu))
names(s) = c("A","SES","IQ")
fig = with(
s,
plot_ly(x = ~A, y = ~SES, z = ~IQ,
width = 900, height = 900) %>%
add_surface(
contours = list(
z = list(
show=TRUE,
usecolormap=TRUE,
highlightcolor="#ff0000",
project=list(z=TRUE)
)
)
) %>%
add_markers(x = dt$A, y = dt$SES, z = dt$IQ, size = .5)
)
fig =
fig %>% layout(
scene = list(
camera=list(
eye = list(x=1.87, y=0.88, z=0.64)
)
)
)
fig
Giagrande 2019: Scarr-Rowe effect
No interaction here. We need some type of “and” statement for an interaction.
Intelligent animal species tend to be either highly social or have manipulative appendages (hands, tentacles, etc.).
Recall the tulips example from the chapter. Suppose another set of treatments adjusted the temperature in the greenhouse over two levels: cold and hot. The data in the chapter were collected at the cold temperature. You find none of the plants grown under the hot temperature developed any blooms at all, regardless of the water and shade levels. Can you explain this result in terms of interactions between water, shade, and temperature?
This is indeed a three way interaction. A quick way would be to simply write the model as
\[ mu_i = \alpha + \beta_{TSW}(T_i \cdot S_i \cdot W_i) \]
However, it is a good idea to generally model all lower level main and interaction effects
$$ i = \ + {T}T_i + {S}S_i + {W}W_i \ + {TS}(T_i S_i) + {SW}(S_i W_i)+ {TW}(T_i W_i) \ + {TSW}(T_i S_i W_i)
$$
The trick here is that we need - a binary indicator variable, for hotness, lets call it \(\small H\) that is \(0\) when it is hot and otherwise one \(1\). - an equation where all other effects interact with hotness.
Here is an example that works:
\[ \mu_i = (\alpha + \beta_{T}T_i + \beta_{S}S_i + \beta_{W}W_i) \cdot H_i \]
Here is an example that does not work:
\[ \mu_i = \alpha + \beta_{T}T_i + \beta_{S}S_i + \beta_{W}W_i\cdot H_i \]
If we have any terms in the model that are not multiplied by hotness, we cannot insure that \(\mu\) will be zero when it is hot. This is also true for a model where the intercept depends on hotness, for example
\[ \mu_i = \alpha_{[H_i]} + \beta_{T}T_i + \beta_{S}S_i + \beta_{W}W_i \]
will not work because \(\small \beta_{T}T_i + \beta_{S}S_i + \beta_{W}W_i\) can be non-zero when it is hot.